to go along with
Modern Data Science with R, 3rd edition by Baumer, Kaplan, and Horton
R for Data Science, 2nd edition by Wickham, Çetinkaya-Rundel, and Grolemund
geom_point()
aes() function
wday.
ggplot() function.
aes() function.
aes() function
y designation in the aes() function for the geom_bar() geometry?25
aes() function.
y should be when not specified.
y is specified in ggplot().
y variable is the same as the x variable.

fill = children and position = "fill"?26
fill = children colors and position = "fill" changes the y-axis
fill = children changes the y-axis and position = "fill" colors
fill = children goes in the aes and position = "fill" goes outside the aes
fill = children goes outside the aes and position = "fill" goes inside the aes
fill = children and position = "fill" are two different ways to write the same thing.

geom_bar() and geom_histogram()?[20c]
geom_bar() is for numbers and geom_histogram() is for categorical variables.
geom_bar() is for categorical variables and geom_histogram() is for numbers.
geom_bar() produces counts and geom_histogram() produces percentages.
geom_bar() produces percentages and geom_histogram() produces counts.

| year | Algeria | Brazil | Colombia |
|---|---|---|---|
| 2000 | 7 | 12 | 16 |
| 2001 | 9 | 14 | 18 |

| country | Y2000 | Y2001 |
|---|---|---|
| Algeria | 7 | 9 |
| Brazil | 12 | 14 |
| Colombia | 16 | 18 |

| country | year | value |
|---|---|---|
| Algeria | 2000 | 7 |
| Algeria | 2001 | 9 |
| Brazil | 2000 | 12 |
| Brazil | 2001 | 14 |
| Colombia | 2000 | 16 |
| Colombia | 2001 | 18 |
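
The second and third tables above hold the same values in a wide and a long layout. As a minimal sketch of how one might move from the wide form to the long form with `tidyr::pivot_longer()`, assuming the wide table is stored in a data frame (the object names `wide` and `long` are made up for illustration):

```r
library(tidyr)

# Wide layout: one row per country, one column per year
# (object name `wide` is hypothetical)
wide <- data.frame(
  country = c("Algeria", "Brazil", "Colombia"),
  Y2000 = c(7, 12, 16),
  Y2001 = c(9, 14, 18)
)

# Stack the two year columns into (year, value) pairs,
# dropping the "Y" prefix from the former column names
long <- pivot_longer(
  wide,
  cols = c(Y2000, Y2001),
  names_to = "year",
  names_prefix = "Y",
  values_to = "value"
)

long
```

Note that `year` comes back as a character column here; converting it to a number would be a separate step.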
Bakery should be upper case
type should not be in quotes
starbucks in wrong place

#(a)
starbucks |>
  group_by(type) |>
  summarize(average_fat = mean(fat))
#(b)
group_by(starbucks, type) |>
  summarize(average_fat = mean(fat))
#(c)
group_by(starbucks, type) |>
  summarize(average_fat = sum(fat))
#(d)
temp <- group_by(starbucks, type)
summarize(temp, average_fat = mean(fat))
#(e)
summarize(group_by(starbucks, type),
          average_fat = mean(fat))

filter()
arrange()
select()
mutate()
group_by()

(theme, price)
(theme, year)
(year, price)
(pieces, year)
(pieces, price)

n_distinct(pieces)
n_distinct(price)
sum(pieces)
sum(pages)
mean(pieces)

library(openintro)
library(dplyr)
lego_sample |>
  filter(!is.na(minifigures)) |>
  # keep only those with minifigures
  group_by(theme, year) |>
  # for each theme for each year
  summarize(ave_pieces = mean(pieces))

# A tibble: 9 × 3
# Groups: theme [3]
theme year ave_pieces
<chr> <dbl> <dbl>
1 City 2018 189.
2 City 2019 257.
3 City 2020 349
4 DUPLO® 2018 50.5
5 DUPLO® 2019 32.5
6 DUPLO® 2020 45.8
7 Friends 2018 354.
8 Friends 2019 259.
9 Friends 2020 250.
# A tibble: 7 × 2
type average_fat
<fct> <dbl>
1 bakery 14.6
2 bistro box 18.4
3 hot breakfast 13.7
4 parfait 6.5
5 petite 9.33
6 salad 0
7 sandwich 14.7
# A tibble: 7 × 2
type average_fat
<fct> <dbl>
1 bakery 14.6
2 bistro box 18.4
3 hot breakfast 13.7
4 parfait 6.5
5 petite 9.33
6 salad 0
7 sandwich 14.7
# A tibble: 7 × 2
type average_fat
<fct> <dbl>
1 bakery 597
2 bistro box 147
3 hot breakfast 110.
4 parfait 19.5
5 petite 84
6 salad 0
7 sandwich 103
# A tibble: 7 × 2
type average_fat
<fct> <dbl>
1 bakery 14.6
2 bistro box 18.4
3 hot breakfast 13.7
4 parfait 6.5
5 petite 9.33
6 salad 0
7 sandwich 14.7
# A tibble: 7 × 2
type average_fat
<fct> <dbl>
1 bakery 14.6
2 bistro box 18.4
3 hot breakfast 13.7
4 parfait 6.5
5 petite 9.33
6 salad 0
7 sandwich 14.7
gdp
year
gdpval
country
-country

gdp
year
gdpval
country
-country

gdp
year
gdpval
country
-country

Midterm score on the x-axis and Final score on the y-axis using the following ggplot() code. Which data frame should you use?39
pivot_wider() on raw data
pivot_longer() on raw data

# A tibble: 4 × 3
student test score
<chr> <chr> <dbl>
1 Alice Midterm 85
2 Alice Final 90
3 Bob Midterm 78
4 Bob Final 82
# A tibble: 2 × 3
student Midterm Final
<chr> <dbl> <dbl>
1 Alice 85 90
2 Bob 78 82
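
The two tibbles above hold the same scores, first with one row per student and test, then with one row per student. A minimal sketch of how the first could be reshaped into the second with `tidyr::pivot_wider()` (the object names `scores_long` and `scores_wide` are made up for illustration):

```r
library(tidyr)

# Long layout: one row per (student, test) pair
# (object name `scores_long` is hypothetical)
scores_long <- data.frame(
  student = c("Alice", "Alice", "Bob", "Bob"),
  test = c("Midterm", "Final", "Midterm", "Final"),
  score = c(85, 90, 78, 82)
)

# Spread the test names into their own columns, one row per student
scores_wide <- pivot_wider(
  scores_long,
  names_from = test,
  values_from = score
)

scores_wide
```

In the wide layout, `Midterm` and `Final` are separate columns, so each can be mapped to its own axis.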
ggplot() code. Which data frame should you use?40
pivot_wider() on raw data
pivot_longer() on raw data

# A tibble: 18 × 11
Subject day_0 day_1 day_2 day_3 day_4 day_5 day_6 day_7 day_8 day_9
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 308 250. 259. 251. 321. 357. 415. 382. 290. 431. 466.
2 309 223. 205. 203. 205. 208. 216. 214. 218. 224. 237.
3 310 199. 194. 234. 233. 229. 220. 235. 256. 261. 248.
4 330 322. 300. 284. 285. 286. 298. 280. 318. 305. 354.
5 331 288. 285 302. 320. 316. 293. 290. 335. 294. 372.
6 332 235. 243. 273. 310. 317. 310 454. 347. 330. 254.
7 333 284. 290. 277. 300. 297. 338. 332. 349. 333. 362.
8 334 265. 276. 243. 255. 279. 284. 306. 332. 336. 377.
9 335 242. 274. 254. 271. 251. 255. 245. 235. 236. 237.
10 337 312. 314. 292. 346. 366. 392. 404. 417. 456. 459.
11 349 236. 230. 239. 255. 251. 270. 282. 308. 336. 352.
12 350 256. 243. 256. 256. 269. 330. 379. 363. 394. 389.
13 351 251. 300. 270. 281. 272. 305. 288. 267. 322. 348.
14 352 222. 298. 327. 347. 349. 353. 354. 360. 376. 389.
15 369 272. 268. 257. 278. 315. 317. 298. 348. 340. 367.
16 370 225. 235. 239. 240. 268. 344. 281. 348. 365. 372.
17 371 270. 272. 278. 282. 279. 285. 259. 305. 351. 369.
18 372 269. 273. 298. 311. 287. 330. 334. 343. 369. 364.
sleep_long <- sleep_wide |>
  pivot_longer(cols = -Subject,
               names_to = "day",
               names_prefix = "day_",
               values_to = "reaction_time")
sleep_long

# A tibble: 180 × 3
Subject day reaction_time
<dbl> <chr> <dbl>
1 308 0 250.
2 308 1 259.
3 308 2 251.
4 308 3 321.
5 308 4 357.
6 308 5 415.
7 308 6 382.
8 308 7 290.
9 308 8 431.
10 308 9 466.
# ℹ 170 more rows
right_join()?41

right_join()?42
name
band
plays

plays variable in a full_join()?43
NA
NULL

students |> inner_join(classes, by = "student_id") |> filter(major != subject)
classes |> anti_join(students, by = "student_id")
students |> anti_join(classes, by = "student_id")
students |> full_join(classes, by = "student_id")
students |> semi_join(classes, by = "student_id")

students |> inner_join(classes, by = "student_id") |> filter(major != subject)
classes |> anti_join(students, by = "student_id")
students |> anti_join(classes, by = "student_id")
students |> full_join(classes, by = "student_id")
students |> semi_join(classes, by = "student_id")

students |> inner_join(classes, by = "student_id") |> filter(major != subject)
classes |> anti_join(students, by = "student_id")
students |> anti_join(classes, by = "student_id")
students |> full_join(classes, by = "student_id")
students |> semi_join(classes, by = "student_id")

students |> inner_join(classes, by = "student_id") |> filter(major != subject)
classes |> anti_join(students, by = "student_id")
students |> anti_join(classes, by = "student_id")
students |> full_join(classes, by = "student_id")
students |> semi_join(classes, by = "student_id")

students |> inner_join(classes, by = "student_id") |> filter(major != subject)
classes |> anti_join(students, by = "student_id")
students |> anti_join(classes, by = "student_id")
students |> full_join(classes, by = "student_id")
students |> semi_join(classes, by = "student_id")

calories
type
type
type
type

fct_recode() do here?56
xxx

str_subset(very.large.word.list, "q[^u]") would not match which of the following?62

"(?<=\\$)\\d"
"(?<=\\$)\\d+"
"\\d(?=\\$)"
"\\d+(?=\\$)"

"\\w+(?!pie)"
"\\w+(?! pie)"
"\\w+(?=pie)"
"\\w+(?= pie)"

[1] "apple" "chocolate" "peach"

addTen() function. The following output is a result of which map_*() call?77
map(c(1,4,7), addTen)
map_dbl(c(1,4,7), addTen)
map_chr(c(1,4,7), addTen)
map_lgl(c(1,4,7), addTen)

[1] "11.000000" "14.000000" "17.000000"
map(c(1, 4, 7), addTen)
map(list(1, 4, 7), addTen)
map(data.frame(a=1, b=4, c=7), addTen)

map(c(1, 4, 7), addTen)
map(c(1, 4, 7), ~addTen(.x))
map(c(1, 4, 7), ~addTen)
map(c(1, 4, 7), function(hi) (hi + 10))
map(c(1, 4, 7), ~(.x + 10))

jan31.months() is not a function.
jan31.ymd() is not a function.
jan31.ymd() is not a function.

library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
jan31 <- ymd("2021-01-31")
jan31 + months(0:11) + days(31)
#> [1] "2021-03-03" NA "2021-05-01" NA "2021-07-01"
#> [6] NA "2021-08-31" "2021-10-01" NA "2021-12-01"
#> [11] NA "2022-01-31"

ifelse() function takes the arguments:90

set.seed() function94

N(talent, 15)
grades and SAT are to talent (bias?)

replace = TRUE)
replace = FALSE)

.
#
< >
[ ]

<img> (image) element?130
<img>
.img
#img
[img]
img

href= (URL) attribute?131
<href>
href
#href
[href]
href

tbl and R tibble both in storage
tbl and R tibble both in memory
tbl in storage and R tibble in memory
tbl in memory and R tibble in storage

SELECT Persons.FirstName
FROM Persons
SELECT FirstName FROM Persons
SELECT "FirstName" FROM "Persons"

SELECT Persons
SELECT * FROM Persons
SELECT [all] FROM Persons
SELECT *.Persons

SELECT COLUMNS(*) FROM Persons
SELECT COUNT(*) FROM Persons
SELECT NO(*) FROM Persons
SELECT LEN(*) FROM Persons

SELECT * FROM Persons WHERE FirstName <> 'Peter'
SELECT * FROM Persons WHERE FirstName = 'Peter'
SELECT * FROM Persons WHERE FirstName == 'Peter'
SELECT * FROM Persons WHERE FirstName LIKE 'Peter'
SELECT [all] FROM Persons WHERE FirstName = 'Peter'

SELECT FirstName = 'Peter', LastName = 'Jackson' FROM Persons
SELECT * FROM Persons WHERE FirstName = 'Peter' & LastName = 'Jackson'
SELECT * FROM Persons WHERE FirstName = 'Peter' AND LastName = 'Jackson'
SELECT * FROM Persons WHERE FirstName = 'Peter' | LastName = 'Jackson'

BETWEEN
WITHIN
RANGE

SELECT LastName > 'Hansen' AND LastName < 'Pettersen' FROM Persons
SELECT * FROM Persons WHERE LastName BETWEEN 'Hansen' AND 'Pettersen'
SELECT * FROM Persons WHERE LastName > 'Hansen' AND LastName < 'Pettersen'

SELECT UNIQUE
SELECT DISTINCT
SELECT DIFFERENT

ORDER BY
ORDER
SORT
SORT BY

SELECT and results comes after FROM
FROM and results comes after WHERE
WHERE and results comes after GROUP BY

SELECT
WHERE

SELECT * FROM Persons ORDER FirstName DESC
SELECT * FROM Persons SORT 'FirstName' DESC
SELECT * FROM Persons ORDER BY FirstName DESC
SELECT * FROM Persons SORT BY 'FirstName' DESC

SELECT the records with foods that are either green or yellow fruit:151
WHERE type = 'fruit' AND color = 'yellow' OR color = 'green'
WHERE (type = 'fruit' AND color = 'yellow') OR color = 'green'
WHERE type = 'fruit' AND (color = 'yellow' OR color = 'green')
WHERE type = 'fruit' AND color = 'yellow' AND color = 'green'
WHERE type = 'fruit' AND (color = 'yellow' AND color = 'green')

JOIN?152
SELECT statement.

UNION operator in SQL?153
SELECT statements.
SELECT statement.

INNER JOIN in SQL?154
SELECT statement.

LEFT JOIN in SQL?155
SELECT statement.

RIGHT JOIN keeps all the rows in …?156
RIGHT JOIN?157
RIGHT JOIN?158
FULL JOIN?159
NULL